1. Objective


Robotic  swarms  consist  of  a  large  number  (potentially  thousands)  of  small,  relatively  simple 
robots capable of autonomous travel and operation as a unit on land, sea, and air. Swarms can
implement simple rules to accomplish a desired collective behavior that involves interaction
between individual members as well as the behavior of the entire swarm (7). These behaviors can
be combined to enable swarm members to perform critical Army tasks such as accompanying
convoys,  mapping  battlefields,  and  clearing  minefields. 

One  potential  problem  with  robotic  swarms  is  that  they  may  become  unstable  when  members  are 
disturbed  by  unexpected  changes  in  weather  or  terrain,  degradation,  attrition,  or  enemy  actions, 
which may negatively impact or terminate the swarm’s mission. Soldier-swarm interaction is a
critical aspect of swarm control, especially in disrupted or degraded conditions: The Soldier must
be kept cognizant of swarm operations through an interface that allows him or her to monitor
status  and/or  institute  corrective  actions.  The  growing  body  of  human-robot  interaction  (HRI) 
research  still  has  little  to  say  about  the  design  of  Soldier-swarm  interface  displays  and  controls. 

The objective of the first year of this two-year effort was to design algorithms and devices that
allow Soldiers to efficiently interact with a robotic swarm participating in a representative
convoy mission. In Year 1 (FY08), this objective was successfully fulfilled by (1) providing
metacognition algorithms that enable swarm members to efficiently monitor changes in swarm
status as they execute their mission (accompanying a manned convoy and searching for
improvised explosive devices) and (2) providing display concepts that can efficiently and
effectively communicate swarm status to Soldiers in challenging battlefield environments. The
objectives of the Year 2 (FY09) research were to (1) extend the metacognition algorithms
successfully developed in Year 1 to enable swarm members to efficiently monitor changes in
swarm status in novel, more complex mission scenarios; (2) develop novel multimodal (speech
and touch) control interfaces that would allow the Soldier to control or modify the swarm’s
mission; and (3) develop control measurement methodologies for the swarm control interface,
taking into account increased Year 2 swarm complexity.


2.  Approach 


In  Year  2,  we  expanded  our  focus  to  more  complex  swarm  and  mission  characteristics,  designed 
and  developed  Soldier  control  interfaces,  and  evaluated  the  expanded  swarm  capabilities  and  the 
Soldier-swarm control interface. We carried out this work as a cross-Directorate cooperative
effort, exploiting U.S. Army Research Laboratory (ARL) expertise in the key areas of modeling,
simulation, and human factors engineering to attempt to solve a future Army problem.

2.1 Swarm and Mission Characteristics

In  Year  2,  we  again  used  a  simulated  swarm  because  it  best  allowed  an  analysis  of  swarm  size 
(number  of  members)  and  type  (ground,  air,  or  micro  systems)  required  for  the  mission,  as  well 
as  the  examination  of  different  Soldier-swarm  interface  technologies.  We  continued  to  focus  on 
convoy  missions,  but  we  increased  the  complexity  of  both  the  swarm  and  the  mission  scenarios. 
In  Year  2,  the  swarm  was  split  into  heterogeneous  “sentry”  team  and  “explorer”  team  members 
to  achieve  better  control.  The  sentry  team  was  required  to  remain  with  the  convoy,  while  the 
explorer  team  accompanied  the  convoy  but  was  also  allowed  to  leave  the  convoy  to  explore 
nearby  “hot  spots”  (terrain  features  of  interest).  The  hot  spots  were  made  more  realistic  by 
introducing  a  detectable  improvised  explosive  device  (IED).  Swarm  members  searched  for  IEDs 
using  a  notional  detector.  If  an  IED  was  found,  a  swarm  member  could  sacrifice  itself  to  destroy 
it.  To  improve  the  realism  of  the  scenario,  we  introduced  an  attrition  model  in  which  swarm 
members  were  destroyed,  with  attrition  events  controlled  by  a  Poisson  random  variable.  Task 
priorities  and  metrics  were  established  to  express  overall  swarm  status  by  defining  how 
information  from  individual  swarm  members  was  prioritized  and  combined  at  the  swarm  level. 
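
As a rough illustration of the attrition model, the sketch below draws the number of attrition events in each simulation step from a Poisson distribution and removes that many randomly chosen members. The function names, the per-step rate of 0.2, the 40-member starting size, and the uniform choice of which member is lost are assumptions made for illustration only; the report does not specify them.

```python
import math
import random


def poisson_sample(rate, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def apply_attrition(members, rate, rng):
    """Remove a Poisson-distributed number of randomly chosen swarm members.

    `rate` is the assumed mean number of losses per time step (hypothetical).
    """
    losses = min(poisson_sample(rate, rng), len(members))
    lost = set(rng.sample(range(len(members)), losses))
    return [m for i, m in enumerate(members) if i not in lost]


rng = random.Random(0)  # fixed seed so a run can be repeated
swarm = [f"robot-{i}" for i in range(40)]
for step in range(20):
    swarm = apply_attrition(swarm, rate=0.2, rng=rng)
print(len(swarm), "members remain after 20 steps")
```

Raising or lowering `rate` in such a sketch plays the role of the swarm attrition rate that is varied in the simulation study.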

To  support  the  overall  Year  2  goals,  we  used  a  potential  field  approach  (also  used  in  Year  1)  in 
which  the  controlling  field  is  a  nonlinear  sum  of  simpler  fields,  each  of  which  provides  control 
for  a  specific  behavior  or  task.  The  fields  for  the  sentry  and  explorer  teams  used  the  same  set  of 
simpler  fields  weighted  according  to  the  priorities  of  each  team.  This  approach  was  chosen 
because it scales easily to large heterogeneous swarms and allows a Soldier/user to dynamically
alter  swarm  behavior  to  meet  mission  needs  by  adjusting  field  parameters. 
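
The idea of one shared set of simple fields weighted by team priority can be sketched as follows. This is a simplification under stated assumptions: the report describes a nonlinear sum, whereas the sketch uses a plain weighted sum of two attraction fields, and the field function, the weight values, and all names are hypothetical.

```python
import math


def attract_to_point(position, target):
    """Unit vector pointing from `position` toward `target` (zero at the target)."""
    dx, dy = target[0] - position[0], target[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)
    return (dx / dist, dy / dist)


def combined_field(position, convoy, hot_spot, weights):
    """Weighted sum of two simple attraction fields evaluated at `position`."""
    to_convoy = attract_to_point(position, convoy)
    to_hot_spot = attract_to_point(position, hot_spot)
    return (
        weights["convoy"] * to_convoy[0] + weights["hot_spot"] * to_hot_spot[0],
        weights["convoy"] * to_convoy[1] + weights["hot_spot"] * to_hot_spot[1],
    )


# Hypothetical team weights: sentries favor the convoy, explorers favor hot spots.
TEAM_WEIGHTS = {
    "sentry": {"convoy": 1.0, "hot_spot": 0.0},
    "explorer": {"convoy": 0.3, "hot_spot": 1.0},
}

robot = (0.0, 0.0)
convoy = (10.0, 0.0)
hot_spot = (0.0, 10.0)
for team, weights in TEAM_WEIGHTS.items():
    print(team, combined_field(robot, convoy, hot_spot, weights))
```

Because both teams evaluate the same fields, changing a team's behavior in this sketch only requires changing its weights, which mirrors the parameter adjustment described above.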

We  introduced  metacognition  into  the  swarm  system  by  developing  a  set  of  swarm  performance 
measures  related  to  the  convoy  mission.  The  first  measure  evaluated  swarm  coverage  of  the 
convoy. The swarm control algorithm attracts swarm members to an elliptical ring surrounding
the  convoy.  Convoy  coverage  was  considered  adequate  if  there  were  no  large  gaps  in  the  ring. 
Two  methods  were  used  to  measure  coverage.  The  first,  most  precise  measure  computed  the 
maximum  neighbor-to-neighbor  arc  length  around  the  ellipse.  Although  computing 
trigonometric  functions  for  each  robot  in  the  ellipse  was  somewhat  demanding,  this  method 
made  it  possible  to  find  and  track  the  size  of  the  gap  as  a  function  of  time.  A  second, 
computationally  simple  method  used  minimal  bounding  boxes  for  both  the  swarm  and  the 
convoy.  The  convoy  was  considered  “covered”  if  its  bounding  box  was  fully  contained  within 
the  bounding  box  for  the  swarm.  Two  additional  measures  used  in  this  study  were  the  number  of 
swarm  explorers  and  sentries.  These  counts  were  used  to  measure  the  viability  of  the  teams. 
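
The two coverage checks can be sketched as below. As an assumption made to keep the example short, the gap measure uses the angle between neighboring robots as seen from the convoy center rather than the true arc length along the ellipse; the function names and the sample coordinates are also hypothetical.

```python
import math


def bounding_box(points):
    """Axis-aligned minimal bounding box as (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def convoy_covered(swarm_points, convoy_points):
    """True if the convoy's bounding box lies fully inside the swarm's."""
    s = bounding_box(swarm_points)
    c = bounding_box(convoy_points)
    return s[0] <= c[0] and s[1] <= c[1] and s[2] >= c[2] and s[3] >= c[3]


def max_angular_gap(swarm_points, center):
    """Largest angle (radians) between neighboring robots as seen from `center`."""
    angles = sorted(
        math.atan2(y - center[1], x - center[0]) for x, y in swarm_points
    )
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    # Close the ring: gap from the last robot back around to the first.
    gaps.append(angles[0] + 2.0 * math.pi - angles[-1])
    return max(gaps)


swarm = [(4.0, 0.0), (0.0, 2.0), (-4.0, 0.0), (0.0, -2.0)]
convoy = [(-1.0, 0.0), (1.0, 0.0)]
print(convoy_covered(swarm, convoy))
print(max_angular_gap(swarm, center=(0.0, 0.0)))
```

The bounding-box test needs only comparisons, while the gap measure needs one trigonometric call per robot but reports how large the worst gap is, so it can be tracked over time.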

As the swarm conducted its mission, it used the above performance measures to modify the
behavior  of  the  swarm  members.  We  designated  convoy  coverage  as  the  most  important  task. 
The  simplest  corrective  measure  the  swarm  could  implement  was  to  alter  the  speed  of  some  of  its 
members.  This  action  ensured  that  explorers  returning  from  exploration  tasks  would  rejoin  the 
convoy quickly and also enabled the swarm to control the neighbor-to-neighbor arc length for
members around the convoy. Since we allowed attrition (loss of members) in the Year 2
scenarios,  altering  the  speed  did  not  ensure  coverage  for  the  convoy.  Thus,  another  corrective 
action  the  swarm  could  employ  was  to  change  the  team  designation  for  individual  members.  In 
our  scenarios,  the  swarm  monitored  the  number  of  sentry  robots.  If  the  number  fell  below  a 
critical  value  (an  arbitrary  value  of  10),  the  swarm  recruited  new  sentries  from  the  explorer  team. 
It  was  also  possible  to  convert  sentries  into  explorers,  using  the  number  of  hot  spots  to  determine 
the  critical  number  of  explorers. 
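
A minimal sketch of the sentry recruitment rule follows. The threshold of 10 comes from the text; the decision to recruit just enough explorers to return to the threshold, the order in which explorers are chosen, and the data layout are assumptions for illustration.

```python
def rebalance_teams(teams, min_sentries=10):
    """Promote explorers to sentries until the sentry count reaches the minimum.

    `teams` maps a robot id to "sentry" or "explorer". Explorers are promoted
    in id order, an arbitrary choice made only for this sketch.
    """
    updated = dict(teams)  # leave the caller's mapping unchanged
    sentries = sum(1 for team in updated.values() if team == "sentry")
    for robot_id in sorted(updated):
        if sentries >= min_sentries:
            break
        if updated[robot_id] == "explorer":
            updated[robot_id] = "sentry"
            sentries += 1
    return updated


teams = {i: "sentry" for i in range(8)}
teams.update({i: "explorer" for i in range(8, 20)})
rebalanced = rebalance_teams(teams)
print(sum(1 for t in rebalanced.values() if t == "sentry"), "sentries after rebalancing")
```

This sketch selects explorers without regard to where they are on the map; a location-aware selection would need each robot's position as an additional input.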

2.2  Soldier  Interface  Characteristics 

Our  Year  2  goal  was  to  design  an  efficient  Soldier-swarm  map  control  interface  with  which  the 
Soldier  could  use  combined  multimodal  commands  to  emplace  different  types  of  objects  (i.e., 
targets,  waypoints,  and/or  hot  spots)  at  different  locations  on  an  interactive  map  display  to  allow 
Soldier  control  of  swarm  movement.  Figure  1  shows  the  interactive  map  used  in  the  Soldier 
interface.  Roads  are  shown  in  black,  buildings  in  green,  and  the  swarm  is  shown  as  a  red  circle  in 
the  upper  right  hand  corner  of  the  map.  We  used  multimodal  (speech  and  touch)  controls 
because  research  suggested  that  when  used  together  in  a  sequence  (combined),  speech  and  touch 
input  may  be  particularly  effective  for  an  interactive  map  control  interface  (2-5). 


Figure  1.  Soldier  interface  map.  The  roads  are  black  and 
the  buildings  are  green. 

In  designing  and  evaluating  the  Soldier  interface,  we  explored  several  issues.  These  included  the 
need  for  measurement  of  time  between  multimodal  control  actions,  relevant  touch  screen  targets, 
and  relevant  speech  commands.  These  issues  are  described  below. 

The  first  issue  is  the  measurement  of  time  between  multimodal  control  actions.  The  time 
between  the  onset  of  a  first  control  action  (e.g.,  a  speech  command)  relative  to  the  onset  of  a 
second,  dependent  control  action  (e.g.,  a  consequent  touch  command)  can  be  defined 
operationally as temporal binding. Knowledge of temporal binding is important because it can
support  a  smoother  fusion  of  commands  to  the  system  and  reduce  system  error.  Although  Oviatt 
(2-4) suggested that identifying time between control actions is important, neither she nor any
other  researcher  actually  measured  inter-command  temporal  binding. 

The  second  issue  is  the  need  to  define  input  difficulty.  In  considering  motor  actions  with  touch 
screen displays, neither Oviatt (2-4) nor any other researcher explored human performance
controlling  map  objects  with  different  levels  of  difficulty,  such  as  static  (nonmoving)  versus 
dynamic  (moving)  touch  screen  targets  (also  referred  to  as  map  objects).  In  Oviatt’s  research,  all 
map  objects  were  static  and  none  were  differentiated  by  size.  In  addition,  emphasis  on 
participant response time and accuracy can have an effect on input time and accuracy. If
response accuracy were held constant (i.e., by emphasizing accurate responses), the time taken to
touch  a  relatively  small  target  (e.g.,  an  intersection)  should  be  longer  than  that  involved  in 
touching  a  relatively  larger  target  (e.g.,  anywhere  on  a  road)  (6).  Similarly,  the  time  taken  to 
touch  a  moving  target  should  be  longer  than  that  required  to  touch  a  static  target  of  the  same  size. 
If  response  time  were  held  relatively  constant  (i.e.,  by  emphasizing  fast  response  time),  accuracy 
of  touch  response  for  moving  targets  and  smaller  stationary  targets  should  be  less  than  that  for 
larger  stationary  targets.  Because  swarm  displays,  and  military  displays  in  general,  can  include 
moving  elements  (i.e.,  swarm  members,  robots,  and  military  vehicles),  research  should  explore 
the  effect  of  targets  of  increasing  level  of  difficulty  (large  static  targets,  small  static  targets,  and 
small  moving  targets)  on  temporal  binding  of  speech  and  touch  commands,  with  special  care 
given  to  emphasis  on  response  time  and  accuracy. 

The  third  issue  is  the  need  for  relevant  speech  commands.  There  is  no  multimodal  research 
involving  constrained  (limited  or  controlled)  speech  commands.  Military  speech  recognition 
command  grammars  often  use  a  limited  vocabulary  with  short  words  and  phrases,  because  this 
approach  has  been  shown  to  work  better  with  current  speech  recognition  technology  (7). 
However,  Oviatt’s  research  used  unconstrained  natural  language  commands,  in  which 
participants  could  use  multi-word  and  multi-sentence  commands  of  their  own  choosing. 
Multimodal  control  research  for  military  and  swarm  environments  should  involve  constrained 
speech  commands,  controlling  for  number  and  type  of  words.  The  use  of  a  smaller,  limited 
vocabulary  would  potentially  decrease  the  length  of  temporal  binding  time  needed  for 
multimodal  commands. 

The approach for presenting combined multimodal controls to explore the issues described
previously is shown in table 1. Speech commands were used to specify the type of object to emplace on the
map (i.e., hot spots, targets, or waypoints), and touch was used to define the location of the map
object  (i.e.,  road,  intersection,  or  leading  or  lagging  swarm  edge).  Speech  commands  could  also 
include  the  spatial  word  “here”  (i.e.,  “hot  spot”  or  “hot  spot  here”).  Speech  and  touch  commands 
could  be  used  in  any  order,  but  both  had  to  be  used  to  complete  the  sequential  set  of  commands. 
We  hypothesized  that  both  the  type  of  map  object  and  type  of  speech  command  would  affect  the 
inter-command time (temporal binding) of speech and touch commands.

Table 1. Examples of map tasks and participant responses.

Message from Headquarters            Participant Response

“Put a hot spot anywhere on 1st.”    In any order, say “hot spot” or “hot spot here”
                                     (depending on condition). Touch screen anywhere
                                     on 1st Street.

“Put a waypoint at the               In any order, say “waypoint” or “waypoint here”
intersection of 2nd and Bravo.”      (depending on condition). Touch screen at the
                                     intersection of 2nd and Bravo.

“Put a target anywhere at the        In any order, say “target” or “target here”
lagging edge of the swarm.”          (depending on condition). Touch screen anywhere
                                     at the lagging edge of the swarm.
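
The pairing of a constrained speech command with a touch location can be sketched as follows. The vocabulary and the optional spatial word “here” come from the text; the function names, the `Placement` type, and the screen coordinates are hypothetical, and no real speech recognizer is involved.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

OBJECT_TYPES = ("target", "hot spot", "waypoint")


def parse_speech(utterance: str) -> Optional[str]:
    """Return the object type named by a constrained command, or None.

    Accepts the bare object word(s) or the same followed by "here".
    """
    words = utterance.lower().split()
    if words and words[-1] == "here":
        words = words[:-1]
    phrase = " ".join(words)
    return phrase if phrase in OBJECT_TYPES else None


@dataclass
class Placement:
    object_type: str
    location: Tuple[float, float]


def fuse(utterance: str, touch_xy: Tuple[float, float]) -> Optional[Placement]:
    """Combine one speech command and one touch into a map placement.

    The arrival order of the two inputs does not matter to the result.
    """
    object_type = parse_speech(utterance)
    if object_type is None:
        return None
    return Placement(object_type, touch_xy)


print(fuse("hot spot here", (120.0, 45.0)))
print(fuse("waypoint", (300.0, 210.0)))
print(fuse("retreat", (0.0, 0.0)))  # outside the vocabulary, so rejected
```

Restricting the parser to a fixed vocabulary keeps rejection of out-of-grammar utterances trivial, which is one practical appeal of constrained commands.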

2.3  Swarm  Simulation  Study 

The  ARL  Vehicle  Technology  Directorate  (VTD)  conducted  a  Year  2  simulation  study  to 
investigate the effectiveness of the metacognitive performance measures. In the experimental
trials,  the  convoy  of  vehicles  followed  a  specified  path  on  the  road  network  accompanied  by  a 
swarm  of  vehicles,  consisting  of  the  “sentry”  team  and  the  “explorer”  team.  The  independent 
variables  were  the  number  of  hot  spots  and  the  swarm  attrition  rate.  To  control  the  number  of 
independent  variables  in  the  experiment,  the  locations  of  the  hot  spots  were  specified  for  each 
experimental  trial. 

2.4  Swarm  Interface  Study 

A  Year  2  laboratory  study  was  conducted  at  the  Human  Research  and  Engineering  Directorate  to 
evaluate the multimodal swarm control interface. The independent variables were type of map
object and command type. Types of map objects were (1) swarm leading or lagging edges
(moving objects that needed one bit of spatial information to locate), (2) map intersections
(stationary objects that needed two bits of spatial information to locate), and (3) map roads
(stationary objects that needed one bit of spatial information to locate). Two types of speech
commands were used: (1) a choice command in which one of three different one-word
commands (“target,” “hot spot,” or “waypoint”) was spoken and (2) a choice command to
which an additional spatial word “here” was added (i.e., “target here,” “hot spot here,” or
“waypoint here”). Examples of speech and touch commands are shown in table 1. Participant
preference  (Preferred  Modality)  in  using  touch  or  speech  first  when  inputting  each  command  was 
also  recorded.  Dependent  variables  included  inter-command  temporal  binding  time  (the 
difference  in  time  between  the  onset  of  the  participant’s  first  audio  or  touch  command,  and  the 
onset  of  the  second  command),  and  the  proportion  of  correct  speech  and  touch  commands. 
Dependent  variables  also  included  the  length  of  time  between  the  start  of  the  control  message 
and  a  simultaneous  alerting  tone  (stimulus),  and  the  resulting  speech  and  touch  commands 
(stimulus  to  onset  of  speech,  and  stimulus  to  onset  of  touch  command  times).  As  recommended 
by human factors guidelines, onset of touch command time was defined as the point in
time when the participant removed his finger from the touchscreen (7).
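
The temporal binding measure defined above can be sketched as a signed difference of onset times. The sign convention (positive when speech comes first) matches the one used when the results are reported; the function names and the sample onset times are hypothetical and are not study data.

```python
def temporal_binding(speech_onset, touch_onset):
    """Signed inter-command time in seconds.

    Positive when speech came first, negative when touch came first.
    `touch_onset` is the time at which the finger left the screen.
    """
    return touch_onset - speech_onset


def speech_first_share(binding_times):
    """Fraction of commands in which speech preceded touch."""
    if not binding_times:
        raise ValueError("no commands recorded")
    return sum(1 for t in binding_times if t > 0) / len(binding_times)


# Hypothetical (speech_onset, touch_onset) pairs, in seconds after the stimulus.
trials = [(3.25, 4.5), (4.25, 3.5), (3.0, 4.0), (2.5, 3.0)]
times = [temporal_binding(speech, touch) for speech, touch in trials]
print(times)
print(speech_first_share(times))
```

Keeping the sign lets a single list of values carry both the size of the interval and which modality led, so a participant's preferred modality can be read off the same data.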

A  total  of  12  male  Marines  with  a  mean  age  of  19  years  from  the  Marine  Detachment  at 
Aberdeen  Proving  Ground,  MD,  acted  as  volunteer  participants.  All  had  normal  hearing  and 
normal  color  vision.  For  each  experimental  condition,  one  Marine  was  seated  in  front  of  the 
touch  screen  and  performed  the  swarm  control  tasks,  using  the  command  type  assigned  to  that 
condition.  Each  Marine  was  instructed  to  respond  as  quickly  and  accurately  as  he  could  when  the 
visual  control  message  and  simultaneous  alerting  tone  were  presented  on  the  swarm  display 
interface.  A  photograph  of  a  Marine  participant  with  the  interactive  map  display  is  shown  in 
figure 2. Each Marine performed one 30-min experimental condition for each command type.
At the end of the second and final condition, each Marine filled out a final questionnaire asking
his opinion of the speech and touch interfaces.


Figure  2.  Marine  participant  with  the 
interactive  map  display. 


3.  Results 


3.1  Swarm  Simulation  Study 

Results  indicated  that  the  swarm  could  maintain  coverage  most  of  the  time  for  the  cases  studied. 
In  our  experimental  trials,  the  swarm  responded  to  0,  1,  or  2  hot  spots.  In  the  case  of  0  hot  spots, 
coverage  problems  were  the  result  of  attrition.  The  swarm  was  able  to  compensate  for  loss  of 
members (by speeding up) as long as the total number of swarm members was greater than 10.
For  the  cases  of  1  or  2  hot  spots,  the  swarm’s  coverage  problems  were  the  result  of  geographic 
dispersion  as  well  as  attrition.  By  changing  the  team  designation  for  some  of  the  explorers,  it 
was possible to maintain coverage for the convoy. In some experimental trials, we noticed an
issue with the team change strategy: our algorithm did not consider geographic location as a
parameter  for  the  team  change.  Consequently,  in  some  cases,  explorers  near  the  convoy  changed 
to  sentries.  To  alter  the  overall  convoy  coverage,  we  need  to  be  able  to  recall  explorers  from  the 
hot  spots.  We  plan  to  address  this  issue  in  our  future  work. 

3.2  Control  Interface  Study 

Results indicated that fewer than 2% of the total speech commands were incorrectly uttered (e.g., the
participant saying “target” instead of “waypoint”). Results also indicated that participants did
not  use  speech  or  touch  first  exclusively  when  issuing  commands.  Across  participants,  76.7%  of 
commands  were  speech  first,  while  23.3%  of  commands  were  touch  first.  Figure  3  shows  the 
number  of  touch  and  speech  responses  for  each  participant,  while  figure  4  shows  mean  temporal 
binding  response  times  (the  difference  in  time  between  the  onset  of  the  participant’s  first  speech 
or  touch  command,  and  the  onset  of  the  second  command)  for  each  participant,  where  positive 
values  denote  speech  responses  before  touch,  and  negative  values  denote  touch  responses  before 
speech.  The  data  indicate  that  7  participants  out  of  12  (participants  1,  2,  3,  5,  7,  11,  and  12)  used 
speech  before  touch  almost  exclusively  (using  speech  first  95%  or  more  of  the  time, 
approximately  95  commands  out  of  100).  Two  participants  (9  and  10)  used  speech  first  88%  and 
73% of the time, respectively. Two participants (4 and 8) used touch-first commands
almost exclusively (97% or more of the time), while the remaining participant (6) used touch-first
commands  73%  of  the  time.  This  between-  and  within-participant  variability  in  the  use  of 
command  modality  should  be  further  explored  in  future  research.  Knowledge  of  command 
variability  should  be  valuable  in  the  design  of  future  speech/touch  systems,  to  help  support  a 
smoother  fusion  of  user  commands  and  to  reduce  system  error. 


Figure 3. Number of touch and speech responses for each participant.

Figure 4. Mean temporal binding times with error bars (+/- 2 SD) for individual participants, with participants ordered from low to high mean response times. (Legend: spatial word included; no spatial word.)

For  temporal  binding  (the  difference  in  time  between  the  onset  of  the  participant’s  first  speech  or 
touch  command,  and  the  onset  of  the  second  command),  a  linear  mixed  model  analysis  of 
variance  (ANOVA)  with  a  post-hoc  Bonferroni  analysis  indicated  significant  (p  <  0.01) 
interactions  for  preferred  modality  x  map  object  and  for  preferred  modality  x  command  type, 
and  included  significant  main  effects  for  preferred  modality. 

Post-hoc  results  for  the  preferred  modality  x command  type  interaction  (figure  5)  indicated  that 
temporal  binding  time  was  significantly  greater  for  speech-first  commands  with  spatial  words 
than  for  touch-first  commands  with  and  without  spatial  words  (p  <  0.001).  Data  analysis 
indicated  that  longer  temporal  binding  times  for  speech-first  commands  occurred  because 
participants  who  input  speech  first  often  waited  until  their  command  was  completely  uttered 
before  touching  the  screen,  while  participants  who  touched  first  did  not  always  wait  until  their 
input was complete before uttering their speech commands. Future research should further
investigate the effect of individual differences in user command preferences on the speed and
accuracy of command input and on temporal binding time.




Figure 5. Mean temporal binding times for command type x preferred modality interaction, with 95% confidence intervals.


Results  for  the  preferred  modality  x  map  object  interaction  (figure  6)  indicated  that  temporal 
binding  was  significantly  greater  (p  <  0.001)  for  speech-first  than  for  touch-first  commands, 
across  all  map  objects.  For  speech-first  commands,  temporal  binding  times  for  intersections 
were  significantly  greater  than  those  for  roads  or  swarm  edges,  with  no  significant  difference 
between  roads  and  swarm  edges.  For  touch-first  commands,  there  were  no  significant  temporal 
binding  differences  between  any  map  objects. 


Figure 6. Mean temporal binding times for map object x preferred modality interaction, with 95% confidence intervals. (Legend: road; intersection; swarm edge.)

Results for the preferred modality main effect indicated that the mean temporal binding time was
significantly  greater  (p  <  0.001)  for  speech-first  commands  (mean  time  1.285  s,  s.d.  0.788)  than 
for  touch-first  commands  (mean  time  0.839  s,  s.d.  0.421).  Thus,  the  difference  in  temporal 
binding  time  between  speech-first  and  touch-first  commands  was  0.446  s.  Again,  data  analysis 
showed  that  this  occurred  because  participants  who  input  speech  first  often  waited  until  their 
command  was  completely  uttered  before  touching  the  screen,  while  participants  who  touched 
first  did  not  always  wait  until  their  input  was  complete  before  uttering  their  speech  commands. 
Future  research  should  examine  individual  differences  in  touch  and  speech  command  output. 

ANOVA  and  Bonferroni  analyses  of  stimulus  to  touch  and  stimulus  to  speech  response  times 
showed  significant  differences  due  to  map  object  type.  Command  inputs  using  intersections 
showed significantly greater response times (p < 0.001) than those using roads or swarm edges. There
was  no  significant  difference  between  roads  and  swarm  edges.  As  can  be  seen  in  table  2, 
intersections had input response times approximately 0.7 s (stimulus to speech) to 1.0 s (stimulus
to touch) greater than roads or swarm edges. The
comparatively  short  mean  response  times  for  moving  swarm  edges  could  have  been  due  to  the 
slow  (one  update/second)  screen  update  rate,  which  may  have  resulted  in  rate  of  movement  of 
the  swarm  being  slow  enough  to  reduce  the  level  of  task  difficulty.  Further  research  should 
involve  faster  swarm  update  rates,  and  an  investigation  of  any  potential  time/accuracy  tradeoff 
involved  in  performing  this  task. 


Table 2. Mean response times and standard deviations for map objects and stimulus to speech and stimulus to touch measures. Values are mean (standard deviation) in seconds.

                                       Map Objects
Measure               Roads            Intersections    Swarm Edges
Stimulus to Speech    3.250 (1.423)    3.967 (1.748)    3.264 (1.145)
Stimulus to Touch     3.867 (1.134)    4.852 (1.328)    3.885 (1.300)

On  their  final  questionnaires,  Marines  commented  that  the  multimodal  controls  were  fast,  simple 
to  use,  and  very  helpful.  One  Marine  commented  that  controls  of  this  type  might  also  extend 
beyond  swarms  as  a  useful  display  for  Squad  personal  digital  assistant  (PDA)  interfaces  for  use 
in providing information regarding IEDs and targets.


4. Conclusions


In  Year  1,  we  successfully  defined  a  40-member  simulated  swarm  to  accompany  a  4-member 
convoy,  and  successfully  developed  metacognition  algorithms  that  enabled  swarm  members  to 
efficiently  monitor  changes  in  swarm  status  as  they  executed  6  different  convoy  missions.  We 
also  successfully  designed  a  human-swarm  display  interface  that  allowed  Marines  to  efficiently 
interact  with  a  robotic  swarm  participating  in  a  representative  convoy  mission. 

In  Year  2,  we  successfully  extended  the  metacognition  algorithms  to  enable  heterogeneous 
swarm  members  to  more  efficiently  monitor  changes  in  swarm  status,  and  developed  novel 
control interfaces that allowed Marines to control or modify the swarm’s mission by
placement  of  targets,  hot  spots,  and  waypoints.  Research  results  indicated  that  for  interactions 
between  preferred  modality  with  command  type,  and  preferred  modality  with  map  object, 
temporal  binding  time  was  significantly  greater  for  speech-first  than  for  touch-first  commands. 
This  indicates  that  individual  differences  (in  this  case,  user  preference  for  speech  or  touch 
commands  first)  can  have  an  effect  on  user  performance  with  a  speech  and  touch  display.  Future 
research  should  further  investigate  the  effect  of  individual  differences  in  preferences  of 
command  input  on  inter-command  speed  and  accuracy. 

Comments  by  Marines  in  our  Year  1  and  Year  2  experiments  indicated  that  multimodal  displays 
and controls permitted them to act as efficient and effective swarm supervisors. Elements of our
completed  research  (i.e.,  our  observations  regarding  the  limitations  of  our  metacognition 
algorithms  and  Marine  suggestions  regarding  swarm  displays  and  controls)  served  as  a  basis 
from which to transition our work (see section 6).


6. Transitions


Our  simulation  work  will  be  used  to  support  VTD  studies  of  elevation  vectors  of  heterogeneous 
(ground  vehicle  and  helicopter)  swarms  currently  being  conducted  by  ARL  and  researchers  from 
the  University  of  Texas  in  Arlington,  TX.  In  addition,  the  Micro  Autonomous  Systems  and 
Technology  (MAST)  Collaborative  Technology  Alliance  (CTA)  has  shown  interest  in  the 
algorithms developed for the simulated swarm, and the Safe Operations of Unmanned Systems
for  Reconnaissance  in  Complex  Environments  (SOURCE)  Army  Technology  Objective  (ATO) 
has  shown  interest  in  the  control  interface.  In  the  first  year  of  this  research,  papers  were 
published  at  several  international  conferences,  including  the  International  Conference  on 
Intelligent  Robots  and  Systems  (IROS)  and  the  Human  Factors  and  Ergonomics  Society  (HFES). 
Second-year  research  papers  are  being  prepared  for  these  conferences. 

In  Year  2,  we  performed  additional  work  beyond  that  stated  within  the  goals  and  objectives  of 
the  Director’s  Research  Initiative  (DRI),  by  expanding  the  Year  1  display  interface.  We  replaced 
the  three-dimensional  (3-D)  view  with  a  more  realistic  two-dimensional  (2-D)  map  with  icons 
for the swarm members and convoy vehicles. Conditions such as inadequate swarm coverage
now cause visual alerts to display messages that identify coverage or team size problems and
state the corrective action that the swarm used to mitigate the problem. The expanded interface
contains controls that allow the user to adjust the ratio of members in the swarm explorer and
sentry  teams.  Due  to  time  limitations,  this  interface  was  not  tested.  However,  the  SOURCE 
ATO has shown interest in performing continuing research using this display interface.


List of Symbols, Abbreviations, and Acronyms

2-D       two-dimensional
3-D       three-dimensional
ANOVA     analysis of variance
ARL       U.S. Army Research Laboratory
ATO       Army Technology Objective
CTA       Collaborative Technology Alliance
DRI       Director’s Research Initiative
FY09      fiscal year 2009
HFES      Human Factors and Ergonomics Society
HRI       human-robot interaction
IED       improvised explosive device
IROS      Intelligent Robots and Systems
MAST      Micro Autonomous Systems and Technology
PDA       personal digital assistant
SOURCE    Safe Operations of Unmanned Systems for Reconnaissance in Complex Environments
VTD       Vehicle Technology Directorate